
Add ThreadId for comparing threads #36341

Merged
merged 5 commits into rust-lang:master on Oct 10, 2016

Conversation

sagebind
Contributor

@sagebind sagebind commented Sep 8, 2016

This adds the capability to store and compare threads with the current calling thread via a new struct, std::thread::ThreadId. Addresses the need outlined in issue #21507.

This avoids the need to add any special checks to the existing thread structs and does not rely on the system to provide an identifier for a thread, since that approach seems unreliable and undesirable. Instead, this simply uses a lazily-created, thread-local usize whose value is copied from a global atomic counter. The code should be simple enough that it is about as reliable as the #[thread_local] attribute it uses (however much that is).

ThreadIds can be compared directly for equality and have copy semantics.

Also see these other attempts:

- #29457
- #29448
- #29447

And this in the RFC repo: rust-lang/rfcs#1435
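For readers skimming the discussion, here is a rough, self-contained sketch of the scheme described above. It is an illustration only: it uses the public thread_local! macro and AtomicUsize instead of the #[thread_local] internals the actual PR uses, and it shares the wrap-around caveat raised in the review below.

use std::cell::Cell;
use std::sync::atomic::{AtomicUsize, Ordering};

// Global counter handing out candidate IDs; 0 is reserved for "unassigned".
static NEXT_ID: AtomicUsize = AtomicUsize::new(0);

thread_local! {
    // Lazily filled with a copy of the global counter on first use.
    static CACHED_ID: Cell<usize> = Cell::new(0);
}

#[derive(Copy, Clone, PartialEq, Eq, Debug)]
pub struct ThreadId(usize);

pub fn current() -> ThreadId {
    CACHED_ID.with(|id| {
        if id.get() == 0 {
            id.set(1 + NEXT_ID.fetch_add(1, Ordering::SeqCst));
        }
        ThreadId(id.get())
    })
}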

@rust-highfive
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @aturon (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

if THREAD_ID.0 == 0 {
    THREAD_ID.0 = 1 + THREAD_ID_COUNT.fetch_add(1, Ordering::SeqCst);
}
THREAD_ID
Member

This implementation allows two distinct threads to have the same ID in some cases.

Member

What case is that?

Member

@nagisa nagisa Sep 9, 2016

At least

let main_thread_id = ThreadId::current();
loop {
    let _thread2 = thread::spawn(move || {
        if ThreadId::current() == main_thread_id {
            println!("it happened")
        }
    });
}

To explain: consider a system where usize = u16. The main thread gets ThreadId(1), then the threads in the loop receive IDs ThreadId(2)...ThreadId(!0). The next thread wraps the counter, assigning ThreadId(1) again, so the still-running main thread and the new thread have colliding IDs. On targets with a larger usize, the issue just takes longer to manifest.

Contributor Author

That's the only possibility for collision that I thought of. Would it be better to use a u64? If the number of threads you are creating overflows 64 bits (highly unlikely, IMO), then there's not much we can really do, is there? What possible representation could we even use for thread IDs in that case? I can't imagine an OS even being able to handle up to u64-1 threads.

Contributor Author

Not that Java is something I look up to, but here's how Java implements unique thread IDs:

private static synchronized long nextThreadID() {
    return ++threadSeqNumber;
}

My gut intuition (without doing the math) is that it would take thousands of years of running a loop like that spawning threads to overflow 64 bits. Highly unlikely.

Member

@nagisa nagisa Sep 9, 2016

If the number of threads you are creating is overflowing a 64 bits (highly unlikely, IMO), then there's not much we can really do, is there? … I can't imagine an OS being able to even handle up to u64-1 threads.

You’re assuming that all the threads should be alive at the same time in order for the collision to happen, which is simply not true.

Would it be better to use a u64?

Yes, it would. At least then it wouldn’t take a usize = u32 target less than a week of continuously spawning threads to arrive at colliding thread IDs.

What possible representation could we even use for thread IDs in that case?

Using the address of something unique to a thread in the address space common to all threads would be one way to guarantee that thread IDs do not collide. (see #29448 (comment) though)

Contributor

A similar issue exists for reference-counted pointers. As pointed out in that discussion, you don't actually need to run that many increment instructions to overflow if LLVM optimizes the relevant loop (this is certainly possible for Rc and Arc; I'm not so sure about thread creation, which is probably an observable effect in practice). However, memory safety isn't on the line here, so it's probably okay to slap a footnote on the docs and ignore it. (Using u64 might be better for robustness, but sadly you can't use atomics for that on every platform.)

@alexcrichton alexcrichton added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Sep 8, 2016
@ranma42
Contributor

ranma42 commented Sep 9, 2016

The approach in #29447 seems robust against collisions. Would it be ok to resubmit that branch?
I could try to address the concerns that @alexcrichton raised.

@alexcrichton
Member

Thanks for the PR @sagebind! On the technical side of things, I agree with @nagisa that we'll want to handle the overflowing case here somehow if we take this strategy, especially for usize having 32 bits.

An alternative implementation, however, would be to back this with pthread_self on Unix and GetCurrentThreadId on Windows. The implementation here could then just call those and compare the return values, and we wouldn't have to worry about these atomics and overflow.

The only caveat I know of with this implementation, however, is that we'd have to clearly document that equality does not mean the same thread if one of the threads has exited. That is, if two thread ids are equal, they're probably only actually equal if the thread is currently running. I suspect that both Windows and Unix would recycle thread ids after they exit.

@sagebind
Contributor Author

@alexcrichton That was the primary reason I thought about avoiding the native thread ID calls: the IDs are not guaranteed to be unique over an entire program lifetime. At least that is the case on Windows:

Until the thread terminates, the thread identifier uniquely identifies the thread throughout the system.

I think the bigger problem is pthread_self(3), which is not very useful in and of itself according to the Linux man pages:

POSIX.1 allows an implementation wide freedom in choosing the type used to represent a thread ID; for example, representation using either an arithmetic type or a structure is permitted. Therefore, variables of type pthread_t can't portably be compared using the C equality operator (==); use pthread_equal(3) instead.

We could use pthread_equal(3) to implement comparison between ThreadIds if we store a pthread_t as the ID, but I believe it would be on the heap since the size is implementation-defined. This problem was discussed in #29448.

If we want to use the approach in this PR, is it possible to use AtomicU64 even though it is unstable? This is my first Rust PR, so I'm not familiar with all the ropes yet. 😉

@alexcrichton
Member

since the IDs are not guaranteed to be unique during an entire program lifetime

This seems like a desirable restriction though, no? If we could never recycle thread ids, then that seems like a hard limit on the lifetime of all Rust programs in terms of how many threads they can ever produce.

We could use pthread_equal(3) to implement comparison between ThreadIds if we store a pthread_t as the ID

Yeah I was under the impression we'd use pthread_equal to equate two ids here. I'm not sure I follow the point about the heap though? Couldn't we have:

struct ThreadId(libc::pthread_t);

impl PartialEq for ThreadId { ... }

If we want to use the approach in this PR, is it possible to use AtomicU64 even though it is unstable?

That's a possible strategy, yeah, although I don't think AtomicU64 is supported on all platforms? That may end up limiting its usefulness here :(

This is my first Rust PR, so I'm not familiar with all the ropes yet. 😉

Awesome! And of course thanks again, we're willing to help out wherever necessary!

@alexcrichton
Member

We discussed this at libs triage today, and the decision was that these seem like good methods to stabilize, with the strategy of using the OS primitives and perhaps exposing them later as well (not necessarily as part of this PR). We were also thinking: @sagebind, could you add an id(&self) -> ThreadId method to Thread as well to return this?

@ghost

ghost commented Sep 27, 2016

Using operating system thread primitives may be problematic for reasons other than possible false positives in equality. In POSIX, thread IDs have a lifetime, and using them beyond this lifetime is undefined behaviour. In particular, a thread ID's lifetime ends after the thread is joined or detached.

In practice it may not be that bad, given that the usual implementation of pthread_equal is a simple pointer comparison.

#[unstable(feature = "thread_id", issue = "21507")]
pub fn current() -> ThreadId {
    static THREAD_ID_COUNT: AtomicUsize = ATOMIC_USIZE_INIT;
    #[thread_local] static mut THREAD_ID: ThreadId = ThreadId(0);

thread_local!?

isn't #[thread_local] unsupported on some platforms?

Contributor Author

My understanding was that the thread_local! macro was implemented using #[thread_local], but that may not be the case.

Since the value is atomic already, I felt that the mutexes, keys, and locking provided by thread_local! were unnecessary overhead.

@sagebind
Contributor Author

sagebind commented Sep 27, 2016

I'm still working on this PR and on how to solve this. Using the system thread IDs seems more robust, since they are managed by the OS and can be re-used indefinitely, but @tmiasko has a good point about lifetimes. Using a pthread_t after the thread has terminated is undefined IIRC, and there is no guarantee that no one holds a ThreadId that refers to a terminated thread. In the best case, the same ID is re-used to refer to a different thread. In the worst case, pthread_t is a pointer value no longer pointing to valid data, and we get a segfault.

The struct issue mentioned earlier is not a problem, since interestingly enough pthread_t is currently an alias for one of the following types (depending on platform):

  • usize
  • c_long
  • c_ulong

Windows is much easier (for Vista+); GetThreadId() just returns a DWORD that can be used however we want.
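For reference, a minimal sketch of reading the calling thread's DWORD id on Windows (a hypothetical helper using a raw extern declaration; normally the kernel32/winapi bindings would provide this):

#[cfg(windows)]
fn current_os_thread_id() -> u32 {
    // GetCurrentThreadId returns the DWORD identifier of the calling thread.
    extern "system" {
        fn GetCurrentThreadId() -> u32;
    }
    unsafe { GetCurrentThreadId() }
}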

@ranma42
Contributor

ranma42 commented Sep 27, 2016

On MacOS X it looks like pthread_t is defined as follows:

struct _opaque_pthread_t {
        long __sig;
        struct __darwin_pthread_handler_rec  *__cleanup_stack;
        char __opaque[__PTHREAD_SIZE__];
};
typedef struct _opaque_pthread_t *__darwin_pthread_t;

(though you could interpret the pointer as a usize)

@sagebind
Contributor Author

sagebind commented Sep 27, 2016

@ranma42 Good catch, on the Rust side libc::pthread_t on BSD/OS X is currently defined like so:

pub type pthread_t = ::uintptr_t;

A pointer type, so there are definitely issues if we call pthread_equal on an invalid pthread_t.

@sagebind
Contributor Author

On further experimentation, I don't think we can use the native thread ID facilities and still have the flexibility that we want from a ThreadId type. I'm looking at Unix-like systems specifically here.

Suppose we define ThreadId as the following:

pub struct ThreadId {
    thread: libc::pthread_t,
}

impl PartialEq for ThreadId {
    fn eq(&self, other: &ThreadId) -> bool {
        unsafe {
            libc::pthread_equal(self.thread, other.thread) != 0
        }
    }
}

We want two ways to acquire a ThreadId so that it can be most useful: the ID of the current thread, and the ID of a Thread object. The former can be implemented using pthread_self:

pub fn current_id() -> ThreadId {
    ThreadId {
        thread: libc::pthread_self(),
    }
}

As long as we make ThreadId be !Send, we can guarantee that the ID can never be accessed from another thread or after the thread it represents terminates. But this has almost no use case, since we can't get the ID of any other thread and the only thing we can ever compare it to is itself. So let's say we store a copy of ThreadId in Thread, so the pthread_t is in Thread as well as JoinHandle. Then we can have our Thread::id() method.

But now we have a problem with the lifetime of pthread_t. Consider this example:

let join_handle = thread::spawn(|| {
    thread::current()
});

let dead_thread = join_handle.join().unwrap();
println!("Thread name: {:?}", dead_thread.name());
assert!(thread::current().id() != dead_thread.id());

What should this code do? If we simply use the pthread_t stored in dead_thread, we could have a completely invalid ID. That's the best case. The worst case is that pthread_t is implemented as a pointer type, the target memory has been freed, and we get a segmentation fault. JoinHandle was designed to prevent exactly this kind of thing, using proper lifetime management that matches the native resource.

So I am currently of the opinion that we should avoid using the system thread IDs for this. By the time we change our resource lifetimes for the Thread and ThreadId types so that we can guarantee safety, the types will be so limited that we can't even use them for the problems we were trying to solve in the first place.

I attempted to study the source for other languages to see how they solve this problem, and I generally saw two approaches:

  1. Provide some sort of getHandle() or getId() method on a thread that returns whatever system thread value is currently being used. No guarantees are made that you can actually do anything with it (or that it is still allocated), only that it is the native ID.
  2. Provide a custom ThreadID class that represents a unique ID for that thread. In every implementation I read this value was determined internally and did not use the system thread handle. This is the approach taken in Java, C#, Haskell, D, and Nim.

C# actually provides both ID types, and I think that is the strategy we may want to take (in separate PRs): provide an opaque ID for comparison's sake in this PR, and provide a way to return the HANDLE or pthread_t from a JoinHandle for arbitrary usage (maybe integrating with some C code).


Now, to solve the integer overflow problem, I'm thinking the most efficient way that doesn't leak is with atomics, though I'm not a veteran in Rust's atomics. Is it reasonable to assume that most platforms support AtomicU32? Then the ID could just be an (AtomicU32, AtomicU32) pair. Otherwise we have to use a mutex. Ugh

Or maybe a spinlock?

@alexcrichton
Member

Is it really undefined behavior to call pthread_equal on a pthread_t even if the thread has terminated? I could imagine that something like pthread_join is undefined behavior, but pthread_equal? @tmiasko could you point me at the docs?

@ghost

ghost commented Sep 28, 2016

Thread IDs:

The lifetime of a thread ID ends after the thread terminates if it was created with the detachstate attribute set to PTHREAD_CREATE_DETACHED or if pthread_detach() or pthread_join() has been called for that thread. A conforming implementation is free to reuse a thread ID after its lifetime has ended. If an application attempts to use a thread ID whose lifetime has ended, the behavior is undefined.

pthread_equal:

The pthread_equal() function shall return a non-zero value if t1 and t2 are equal; otherwise, zero shall be returned.

If either t1 or t2 are not valid thread IDs, the behavior is undefined.

Though, I have yet to see an implementation that differs from the following one:

int pthread_equal(pthread_t t1, pthread_t t2) {
  return (t1 == t2);
}

Edit: Another data point: Python's implementation just casts thread IDs to integers (later, to compare thread IDs, it just compares the integers):

/* XXX This implementation is considered (to quote Tim Peters) "inherently
   hosed" because:
     - It does not guarantee the promise that a non-zero integer is returned.
     - The cast to long is inherently unsafe.
     - It is not clear that the 'volatile' (for AIX?) are any longer necessary.
*/
long
PyThread_get_thread_ident(void)
{
    volatile pthread_t threadid;
    if (!initialized)
        PyThread_init_thread();
    threadid = pthread_self();
    return (long) threadid;
}

@sagebind
Contributor Author

sagebind commented Sep 28, 2016

FreeBSD at least defines pthread_t as a memory address, and even uses that implementation of pthread_equal. Interesting.

// freebsd/sys/sys/_pthreadtypes.h
typedef struct  pthread         *pthread_t;
// freebsd/lib/libthr/thread/thr_equal.c
int
_pthread_equal(pthread_t t1, pthread_t t2)
{
    /* Compare the two thread pointers: */
    return (t1 == t2);
}

In theory we would be safe by using pthread_equal, but the POSIX standard can't guarantee that.

@Amanieu
Member

Amanieu commented Sep 28, 2016

I've been using the thread_id crate for some of my code, in particular for my thread_local crate. thread_id returns the id of the current thread as a usize, by calling pthread_self and GetCurrentThreadId.

thread_local uses this value in an AtomicUsize to track values belonging to different threads. Empty slots are represented using a value of 0, since on Windows and all currently existing Unix systems the thread id will never be 0.
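As a rough sketch of that "0 means empty" slot scheme (hypothetical code, not the thread_local crate's actual implementation):

use std::sync::atomic::{AtomicUsize, Ordering};

// A slot owned by at most one thread: `owner` holds a nonzero thread id, or 0 when free.
struct Slot {
    owner: AtomicUsize,
}

impl Slot {
    const fn new() -> Slot {
        Slot { owner: AtomicUsize::new(0) }
    }

    // Claim the slot for `thread_id`; succeeds only if the slot was free.
    fn try_claim(&self, thread_id: usize) -> bool {
        debug_assert!(thread_id != 0);
        self.owner
            .compare_exchange(0, thread_id, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
    }

    // Release the slot so another thread may claim it later.
    fn release(&self) {
        self.owner.store(0, Ordering::Release);
    }
}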

@Amanieu
Member

Amanieu commented Sep 28, 2016

@tmiasko As far as I know the only pthread implementation that doesn't use a pointer for pthread_t is pthread-win32 (https://github.com/GerHobbelt/pthread-win32/blob/master/pthread.h#L575).

(Note that nowadays most people use winpthreads instead of pthread-win32, which uses a simple pointer for pthread_t.)

@alexcrichton
Member

Thanks for the investigation @tmiasko and @sagebind! The Python implementation is interesting as it seems to violate a super strict reading of the spec, but then again the fact that all but one implementation just uses integers makes me think that it's the pragmatic thing to do anyway.

Given all that I'm tempted to ignore a pedantic interpretation of the spec (especially given the precedent) and go ahead with an implementation detail of pthread_t and DWORD under the hood. We won't expose them just yet, and that also still buys us room to explore a different implementation strategy in the future.

@alexcrichton
Member

Ok, I got a chance to talk about this during libs triage today. Our conclusion was that, due to these limitations and the implementations elsewhere, we should just go with a u64 everywhere and wrap it in a Mutex for now for synchronized access.

@sagebind would you be up for implementing this?

@sagebind
Contributor Author

sagebind commented Oct 4, 2016

@alexcrichton Sounds like a plan. I can implement this in my next iteration and will push it to this PR (I've got like 3 versions now 😀).

I've also developed an algorithm for an atomically incrementing (usize, usize, ...) counter while experimenting, but I don't know if that is useful here. The only way to guarantee at least 64 bits of range on every target is a (usize, usize, usize, usize), which may be larger than we want on a 64-bit system.
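A minimal sketch of the agreed direction, a u64 counter behind a lock. It uses today's const-constructible std::sync::Mutex purely for illustration (inside libstd the same idea has to be expressed with the library's internal synchronization primitives), and the names COUNTER and new_thread_id are hypothetical:

use std::sync::Mutex;

static COUNTER: Mutex<u64> = Mutex::new(0);

#[derive(Copy, Clone, PartialEq, Eq, Debug)]
pub struct ThreadId(u64);

pub fn new_thread_id() -> ThreadId {
    let mut counter = COUNTER.lock().unwrap();
    // Panic rather than wrap around and hand out a duplicate ID.
    let id = counter
        .checked_add(1)
        .expect("failed to generate unique thread ID: bitspace exhausted");
    *counter = id;
    ThreadId(id)
}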

).is_err() {
    // Give up the rest of our thread quantum if another thread is
    // using the counter. This is the slow(er) path.
    yield_now();
Member

Instead of a busy loop, could this just use a mutex for now?

Contributor Author

I'm not sure what the best way would be to set up a mutex statically. Isn't StaticMutex gone now?

Member

In the context of libstd you can reach into sys_common::mutex::Mutex

Contributor Author

@sagebind sagebind Oct 5, 2016

It seems like more time would be spent in the mutex syscalls than actually using the counter -- doesn't seem worth it. With a spinlock, at least the common uncontended case uses no syscalls.

There's no defined time where we could call Mutex::destroy(), since a thread could be created at any time.

Once uses a similar atomics approach to this, if I recall, for the same reasons.

Member

It's OK in general not to call Mutex::destroy, and otherwise I don't think there's going to be any real impact here from a mutex vs. a spinlock. While Once uses atomics, it doesn't use spinlocks internally.

In general we try to avoid spinlocks wherever possible, as there are generally more suitable synchronization primitives.

Member

Seems like more time would be spent in the mutex syscalls than actually using the counter

Most of the relevant systems have fast userspace implementations which get used under low contention. Please use Mutex, or something based on Mutex, in order to avoid rolling your own.

// If we somehow use up all our bits, panic so that we're not
// covering up subtle bugs of IDs being reused.
if COUNTER == ::u64::MAX {
    panic!("failed to generate unique thread ID: bitspace exhausted");
Member

This'll want to unlock the lock before the thread panics

Contributor Author

Good catch. I'll fix that in my next commit.

@alexcrichton
Member

@bors: r+

@bors
Contributor

bors commented Oct 9, 2016

📌 Commit 032bffa has been approved by alexcrichton

@bors
Contributor

bors commented Oct 10, 2016

⌛ Testing commit 032bffa with merge 6d62084...

bors added a commit that referenced this pull request Oct 10, 2016
Add ThreadId for comparing threads

@alexcrichton
Member

@bors: retry force clean

1 similar comment

@alexcrichton alexcrichton merged commit 032bffa into rust-lang:master Oct 10, 2016
@alexcrichton
Member

This passed on most platforms and then buildbot got stuck, merging manually.

@vitiral
Contributor

vitiral commented Oct 18, 2016

I'm confused about using a u64 -- doesn't that still put an "upper limit" on the number of threads that can ever exist in the life of a rust program? Obviously it will wrap, but then you won't be guaranteed that thread_1.thread_id != thread_2.thread_id

@sfackler
Member

How long would you estimate it would take to create 19 quintillion threads?

@sfackler
Member

Put more concretely, even creating one thread every nanosecond, it would take 584 years to overflow the thread ID.

@vitiral
Contributor

vitiral commented Oct 18, 2016

ah yes, I should have done the math. If you created and destroyed 1000 threads per second it would take 584 million years to overflow the value:

2 ** 64 / 60 / 60 / 24 / 365 / 1000 == 584942417

That would be quite the rust program!

Or, as you said, at one per nanosecond it would take 584 years.

@vitiral
Contributor

vitiral commented Oct 18, 2016

This could be a problem when we have 10-million-core computers that run programs for hundreds of years. I think we can safely say we are far enough away from that.

@alexcrichton
Member

Also note that this is just an implementation detail currently, we don't expose the u64 itself.

@Amanieu
Member

Amanieu commented Oct 19, 2016

Rayon uses a neat trick to get a non-zero usize thread id by taking the address of a thread-local variable.
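A rough sketch of that trick (hypothetical code, not Rayon's actual implementation): the address of a thread-local is nonzero and distinct for every live thread, though it can be reused after a thread exits.

use std::cell::Cell;

thread_local! {
    // This thread-local exists only so each thread has a distinct address to use.
    static ID_MARKER: Cell<u8> = Cell::new(0);
}

// Returns a nonzero usize that is unique among currently running threads.
fn current_thread_id() -> usize {
    ID_MARKER.with(|marker| marker as *const Cell<u8> as usize)
}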

@ranma42
Contributor

ranma42 commented Oct 19, 2016

@alexcrichton the overflow behaviour is an implementation detail that would be trivial to change, but the current implementation also guarantees that no id will ever be re-used. Users of the API might end up relying on this even though the documentation only guarantees that an id is unique among running threads.

@sagebind
Contributor Author

@ranma42 I think maybe the docs should change to also guarantee that IDs will not be reused; that's a strong guarantee, and one that makes thread IDs more useful in my opinion.

@Amanieu
Member

Amanieu commented Oct 19, 2016

I disagree: I think the implementation should allow thread IDs to be reused, since that allows a more efficient implementation of thread IDs in the future. This matches the way every other thread ID implementation works (Windows thread IDs, pthread_t with pthread_equal, Java Thread.getId(), etc. all make no such guarantee).

Labels
T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.